Current Research

We are continuing the work begun in Years 1 (1991-1992) and 2 (1992-1993) and
reported in our earlier progress reports this year. The thrust of our group continues to be 
the study of on-line fully adaptive algorithms for data compression with real-time parallel 
implementations. Such algorithms are key to NASA applications where high speed is 
required and diverse data sets need to be handled. 

Here we summarize what is new relative to what was reported last year.



• Image Compression: A paper on our basic single-pass adaptive VQ with variable size
and shaped codebook entries has appeared in the Proceedings of the IEEE. A new paper
was presented at the 1994 IEEE Data Compression Conference that describes the use
of k-d trees for a fast serial implementation that can run on a UNIX workstation. In
addition, this paper describes a number of key improvements to the basic algorithm.
The Computer Science Department at Brandeis University has recently received a
1 million dollar grant from the NSF for the purchase of parallel computing equipment;
part of these funds has already been used to purchase a 4,096-processor MasPar
machine; the remainder was used to purchase a 16-node SGI Challenge machine. We
have been conducting experiments with this machine on practical sub-linear parallel
implementations of the algorithm.

• Video Compression: Our work on the basic adaptive displacement estimation
algorithm that tracks variable shaped groups of pixels from frame to frame has appeared
in the same issue of the Proceedings of the IEEE as our work on adaptive image
compression. In addition, we have submitted for journal publication new work on the
integration of this algorithm into a complete video and image sequence compression
system. We are in the process of compiling extensive experimental results with the
system.

• Parallel Algorithms: Our work on sub-linear algorithms for parallel text compression
has been submitted for journal publication. We have conducted experiments with our
new approach to sub-linear text compression that closely approximates optimal
compression but is much more practical to implement. Using an extremely simple parallel
model (a linear array where processors can only talk to adjacent neighbors), we have
achieved poly-log time and extremely close approximation to optimal compression. As
parallel computers become more common, algorithms such as this will provide
practical ways to fully utilize the power of these machines in NASA applications involving
large amounts of data.

• Error Propagation: A paper on our basic error resilient algorithm has been
submitted for journal publication. We are continuing our investigation of “error resilient”
systems, and their application to lossy systems.


Appendix: As indicated above, the two papers that recently appeared in the Proceedings 
of the IEEE give good summaries of the key work performed under this contract. Attached 
are copies of these papers. 
Constantinescu and Storer [4], [5] present a new single-pass
adaptive vector quantization algorithm that learns a codebook of
variable size and shape entries; they present experiments on a set
of test images showing that with no training or prior knowledge
of the data, for a given fidelity, the compression achieved typically
equals or exceeds that of the JPEG standard. This paper presents
improvements in speed (by employing k-d trees), simplicity of
codebook entries, and visual quality with no loss in either the
amount of compression or the SNR as compared to the original
full-search version.

I. Introduction 

Vector quantization is a powerful approach for lossy 
image compression when a good codebook is supplied, but 
the need to have this codebook supplied in advance can 
be a significant drawback. Constantinescu and Storer [4],
[5] show how to combine the ability of lossless adaptive 
dictionary methods to process data in a single pass with 
the ability of vector quantization accurately to approximate 
data. For a given overall fidelity of the decompressed image, 
the compression achieved by this new approach typically 
equals or exceeds the JPEG standard. In addition, it often 
outperforms traditional trained VQ (even in the best case, 
where the codebook is specifically trained for the type of 
data being compressed) while at the same time having a 
number of additional advantages: First, it is a single-pass 
adaptive algorithm (requiring no codebook to be provided 
in advance). Second, one can provide precise guarantees 
in advance on the distortion of any l x l subblock of the
image (whereas trained VQ simply finds the best match
to an available codebook). Third, with a fixed codebook
size, one can continuously vary the fidelity/compression
tradeoff (whereas trained VQ typically achieves different 
tradeoffs by employing multiple codebooks). Our algorithm 
also enjoys some of the advantages of trained VQ, such as 
fast table-lookup decoding. 

Manuscript received November 1, 1993; revised January 15, 1994.

The authors are with the Department of Computer Science, Brandeis
University, Waltham, MA 02254 USA.
This paper presents improvements in speed, simplicity
of codebook entries, and visual quality with no loss in
either the amount of compression or the signal-to-noise ratio
(SNR) as compared to the original full-search version.
Section II reviews the basic single-pass adaptive VQ algorithm
presented in Constantinescu and Storer [4], [5]. Section
III presents a k-d tree implementation of the dictionary
that greatly improves the speed of serial implementations
with no loss in either the amount of compression or the
SNR as compared to the original full-search version. In
fact, due to a minor improvement in the basic algorithm
(see the end of Section II), the experiments reported here
improve upon what is reported in Constantinescu and Storer
[4], [5]. Section IV presents a new learning heuristic that
employs only square-shaped entries. Section V presents a
new method for distortion computation that improves visual
quality without any significant sacrifice in the SNR. Section
VI mentions some current areas of research.

II. The Basic Single-Pass Adaptive VQ Algorithm

In this section we review the work presented in [4],
[5]. As mentioned in the Introduction, one can view this
approach as combining ideas from adaptive lossless
compression and from vector quantization.

With lossless adaptive dictionary methods, a local
dictionary D is used to store a constantly changing set of strings.
Data are compressed by replacing substrings of the input
stream that also occur in D by the corresponding index into
D; we refer to such indices as pointers. The encoding and
decoding algorithms work in lockstep to maintain identical
copies of D (which is constantly changing). The encoder
uses a match heuristic to find a match between the incoming
characters of the input stream and the dictionary, removes
these characters from the input stream, transmits the index
of the corresponding dictionary entry, and updates the
dictionary with an update heuristic that depends on the
current contents of the dictionary and the match that was
just found. If there is not enough room left in the dictionary,
a deletion heuristic is used to delete an existing entry. For
Fig. 1. (a) ChestCAT original. (b) ChestCAT map. (c) ChestCAT dictionary.
Fig. 2. On-line adaptive VQ.

an overview of adaptive lossless compression, see the book
by Storer [14].

Vector quantization is a lossy method that compresses
an image by replacing subblocks by indices into a
dictionary of subblocks. Traditionally, the subblocks are all the
same size and shape and the dictionary must be computed
in advance by “training” on sample data. Not only can
training be computationally expensive, but “full-search”
encoding that is guaranteed to find the closest vector
in the dictionary can also be very time-consuming. In
practice, tree-structured dictionaries are often used. Lin [10]
studies the performance-complexity tradeoffs for vector
quantization. See Gersho and Gray [9] for an introduction
to vector quantization and references to the literature.

The basic single-pass adaptive VQ algorithm presented
in [4], [5] is depicted in Fig. 2, which is followed by
Algorithms 1a and 1b, the Lossy Generic Encoding and
Decoding Algorithms for on-line adaptive vector
quantization. Fig. 1 illustrates the algorithms by showing for
a CAT-scan chest image (Fig. 1(a)), a map of how the
compressor covers the image with rectangles (Fig. 1(b)),
and a portion of the dictionary (Fig. 1(c)) about
halfway through the compression process. The operation of the
generic algorithms is guided by the following heuristics.

The Growing Heuristic: The heuristic selects one growing
point GP(x, y, q) from the available pool GPP. All


1) Initialize the local dictionary D to have one entry for each pixel
   of the input alphabet and the growing points pool (GPP) with one
   (or more) growing points.

2) Repeat until there are no more growing points in GPP:

   a) {Select the next growing point from GPP:}
      Use a growing heuristic to choose a growing point GP from
      GPP.

   b) {Get the best match block b:}
      Use a match heuristic to find a block b in D that matches
      with acceptable fidelity image(GP, b) (the portion of the image
      determined by GP having the same size as b). Transmit
      ⌈log2 |D|⌉ bits for the index of b.

   c) {Update D and GPP:}
      Add each of the blocks specified by a dictionary update
      heuristic to D (if D is full, first use a deletion heuristic
      to make space).

Algorithm 1a: Lossy Generic Encoding Algorithm.

1) {Initialize D and GPP by performing Step 1) of the encoding
   algorithm.}

2) Repeat until there are no more growing points in GPP:

   a) {Select the next growing point from GPP:}
      Perform Step 2a of the encoding algorithm to obtain GP.

   b) {Get the best match block b:}
      Receive ⌈log2 |D|⌉ bits for the index of b. Retrieve b from D
      and output b at the position determined by GP.

   c) {Update D and GPP:}
      Perform Step 2c of the encoding algorithm.

Algorithm 1b: Lossy Generic Decoding Algorithm.

experiments reported here use the wave heuristic (a “wave
front” that goes from the upper left corner down to the
lower right corner). Other examples of growing heuristics
include circular (a “ball” that expands outward from the
center), diagonal (a successive “thickening” of the main
diagonal), and FIFO (first-in first-out).

The Match Heuristic: This heuristic decides what block b
from the dictionary D best matches image_GP (the portion
of the image of the same shape as b defined by the
currently selected growing point GP). All experimental
results reported here use the greedy heuristic (choose the
largest match possible of acceptable quality, and among
two matches of equal size, choose the one of best
quality). The parameters that guide the matching process are:
The distance measure: we use the standard mean-square error
measure in all experiments. The elementary subblock size
l x l: large matches can be divided into subblocks of constant
Fig. 3. “OneRow + OneColumn” learning heuristic. (Legend: the matched block;
previously encoded image.)

size l x l and then distance is computed as the maximum
distance among the subblocks; this prevents distortion from
being unacceptable in a small portion of a match because it
is better than needed in other areas (all experiments reported
here use l x l = 4 x 4). The type of coverage: examples
of image covering strategies include first coverage where
the distance is computed only on the uncovered part of
image_GP, last coverage where the match is computed for
the entire block (except if it falls outside the image borders),
and average coverage (used in all experiments reported
here) where the match is computed for the entire block as
for last, but on overlapped areas the resulting value is the
average value between all the values of matches that happen
to cover that pixel. The threshold d: a real number that
defines the maximum allowed distance (distortion) between
image_GP and b.
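
The maximum-over-subblocks distance described above can be illustrated with a short sketch. This is only a minimal Python illustration, not the implementation used in the experiments; the function names (`mse`, `max_subblock_distance`, `acceptable`) and the list-of-rows block representation are assumptions made here for the example.

```python
def mse(a, b):
    """Mean-square error between two equal-size blocks (lists of rows)."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += (pa - pb) ** 2
            count += 1
    return total / count


def max_subblock_distance(region, block, l=4):
    """Largest mean-square error over the l x l subblocks of a match.

    `region` is the image area under the candidate match and `block`
    is the dictionary entry; both have the same shape. Subblocks at the
    right and bottom edges may be smaller than l x l.
    """
    rows, cols = len(block), len(block[0])
    worst = 0.0
    for r in range(0, rows, l):
        for c in range(0, cols, l):
            sub_region = [row[c:c + l] for row in region[r:r + l]]
            sub_block = [row[c:c + l] for row in block[r:r + l]]
            worst = max(worst, mse(sub_region, sub_block))
    return worst


def acceptable(region, block, d, l=4):
    """A match is acceptable only if every subblock meets threshold d."""
    return max_subblock_distance(region, block, l) <= d
```

For example, a 4 x 8 match that is perfect on its left half but has a mean-square error of 16 on its right half has an overall mean-square error of 8, yet it is rejected for a threshold of 10 because the worst subblock exceeds the threshold.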

The Growing Points Update Heuristic: The growing
points update heuristic is responsible for generating new
growing points after each new match is made. For all
experiments reported here, the concave corners of the
partially encoded/decoded image are chosen.

The Dictionary Update Heuristic: The dictionary update
heuristic adapts the contents of the dictionary D to the
part of the image that is currently encoded/decoded. All
experiments reported here use the OneRow + OneColumn
dictionary update heuristic, depicted in Fig. 3, that adds
(if possible) two new blocks to the dictionary, constructed
by extending the previously matched block (or part of it)
vertically and horizontally by one row.

The Deletion Heuristic: This heuristic maintains the
dictionary D so it can have a predefined (constant) size.
All experiments reported here use the LRU heuristic (delete
the entry that has been least recently used).
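
As an illustration of the LRU deletion heuristic, the following is a minimal Python sketch of a fixed-capacity dictionary that evicts the least recently used entry and reuses its index. The class name `LRUDictionary` and its methods are hypothetical, and the sketch makes simplifying assumptions: it does not protect the initial alphabet entries, and it treats both insertion and matching as a "use".

```python
from collections import OrderedDict


class LRUDictionary:
    """Fixed-capacity block dictionary with least-recently-used deletion."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Maps index -> block; the least recently used entry comes first.
        self.entries = OrderedDict()

    def use(self, index):
        """Return the block at `index` and mark it most recently used."""
        self.entries.move_to_end(index)
        return self.entries[index]

    def add(self, block):
        """Insert a block, evicting the LRU entry when the dictionary is full."""
        if len(self.entries) >= self.capacity:
            # Evict the least recently used entry and reuse its index,
            # so that indices always fit in a fixed number of bits.
            index, _ = self.entries.popitem(last=False)
        else:
            index = len(self.entries)
        self.entries[index] = block
        return index
```

Because the encoder and decoder apply the same sequence of `use` and `add` calls, they evict the same entries and their dictionaries stay identical.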

Before closing this section, we should report an
experimental finding made after the writing of Constantinescu
and Storer [4]. Although experiments have shown that the
basic algorithm is robust over a wide choice of heuristics,
allowing growth in only one quadrant (as long as possible)
typically improves compression (by about 10% on average)
for the same SNR. Because wave growing can “fill” the
entire image and still satisfy the above restriction, this paper
has switched from circular (used in Constantinescu and
Storer [4]) to wave.

III. K-D Tree Dictionary Data Structure
The basic algorithm presented in Constantinescu and 
Storer [4], [5] encodes with simple linear search to find 
matches, and is very slow if implemented on a standard se- 
rial architecture (decompression is essentially table-lookup, 
and is quite fast). In this section we present a new algorithm 
based on k-d trees that reduces the search time from 
minutes or even hours to a few seconds on a UNIX 
workstation. 

If we consider each dictionary block b with k_b = m_b x n_b
pixels as a point in a k_b-dimensional space, the problem is
to find the closest point (best block) to a given point (image
area image_GP) from a set of points (dictionary of blocks);
that is, a nearest neighbor search problem (e.g., Preparata
[13], Dasarathy [6]). However, the problem has several
nontrivial peculiarities: First, the dictionary blocks have
variable dimension (k_b) and variable shape (m_b and n_b
can have arbitrary values). Second, the dictionary maintains
a dynamic set of blocks: in addition to search we need
insertions and deletions. And third, the “best” block is
defined by a match heuristic that may use a variety of
distortion measures that work over a variety of rectangle
sizes (and there is always a perfect match to the unit
size). Typically, nearest neighbor algorithms perform
time-consuming preprocessing in order to have fast processing
time. This works well if the set of points is static (does not
change during processing). However, in our case the set of
points (dictionary) consists of the alphabet at the beginning
of encoding, and changes during encoding, on average with
two insertions and eventually two deletions for each search.

We have employed a data structure based on k-d trees
(e.g., Bentley [1], Bentley and Friedman [2], Overmars
and van Leeuwen [12]). Each branch in the tree relies on
some discriminating dimension and a partition value. The
nonterminal nodes contain the (two) pointers to the sons,
the partition value, and the discriminating dimension (which
can be data-dependent); terminal nodes (named buckets)
contain data (dictionary blocks). Because we are using
the wave growing heuristic, we can assume that a region
that is being matched is always “attached” to the already
compressed portion of the image at its upper left corner,
and we use the upper left 4 x 4 subblock of the region to
provide the keys for the search. To find matches that are
less than 4 pixels in either dimension, we employ a few
additional trees, to be discussed shortly.

A significant difference between our algorithm and the
algorithm of Friedman, Bentley, and Finkel [7] is that we have a
bound on the allowable distortion (the distortion threshold
d) before starting the search. So, we can start a range search
for the “best” block using the distortion threshold to define
the range (instead of going first for some nearest neighbor
block, computing the distance r between this block and the
query block, and then doing a range search backward, the
ChestCAT: CAT-scan chest image, 512 by 512 pixels, 8
bits per pixel.

BrainMrSide: Magnetic resonance medical image that
shows a side cross-section of a head, 256 by 256 pixels,
8 bits per pixel; this is the medical image used by Gray,
Cosman, and Riskin [GCR91].

BrainMrTop: Magnetic resonance medical image that
shows a top cross-section of a head, 256 by 256 pixels,
8 bits per pixel.

NASA5: Band 5 of a 7-band image of Donaldsonville,
LA; the least compressible of the 7 bands by UNIX
compress.

NASA6: Band 6 of a 7-band image of Donaldsonville,
LA; the most compressible of the 7 bands by UNIX
compress.

WomanHat: The standard woman in the hat photo, 512
by 512 pixels, 8 bits per pixel.

LivingRoom: Two people in the living room of an old
house with light coming in the window, 512 by 512
pixels, 8 bits per pixel.

FingerPrint: An FBI finger print image, 768 by 768
pixels, 8 bits per pixel; includes some text at the top.

Handwriting: The first two paragraphs and part of
the figure of page 165 of Image and Text Compression
(Kluwer Academic Press, Norwell, MA) written
by hand on a 10 inch high by 7.5 inch wide piece of
gray stationery scanned at 128 pixels per inch, 8 bits
per pixel; approximately 3.2 million bytes.

Fig. 4. Description of the images.

so-called “bounds-overlap-ball” test). If we use the range
[x_i - d, x_i + d] for each dimension i of the query block
x (key area), deciding to go left, right, or both ways in
the k-d tree depending on how this range compares with
the partition value v associated with the currently visited
nonterminal node, we end up by selecting all potential best
matches (all blocks which meet the distortion threshold on
the key area), no matter what distortion measure we use
as long as it is monotonic in dimension values as well
as in the number of dimensions (conditions required also
by the Friedman, Bentley, and Finkel algorithm). An example
of such a measure is the standard L2 (Euclidean) metric.
Although mean-square error does not satisfy this condition,
it is a bit faster to compute (because there is no square root
to compute) and works equally well in practice.
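
The range search described above can be sketched as follows. This is a minimal, static Python illustration under stated assumptions: keys are plain tuples of pixel values, the tree is built once rather than maintained dynamically with insertions and deletions, and the names `Node`, `build`, and `range_search` are hypothetical. The split rule (largest variance, mean partition value, buckets of at most 8 entries) is just one reasonable choice. The search collects every key whose coordinates all lie within [x_i - d, x_i + d]; the candidates would still have to be checked with the actual distortion measure.

```python
class Node:
    """Nonterminal node (dim, value, left, right) or bucket (terminal node)."""

    def __init__(self, dim=None, value=None, left=None, right=None, bucket=None):
        self.dim = dim          # discriminating dimension
        self.value = value      # partition value
        self.left = left        # keys with key[dim] <= value
        self.right = right      # keys with key[dim] > value
        self.bucket = bucket    # list of keys, for terminal nodes only


def build(keys, bucket_size=8):
    """Build a k-d tree over equal-length tuples."""
    if len(keys) <= bucket_size:
        return Node(bucket=list(keys))

    def variance(dim):
        vals = [k[dim] for k in keys]
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals) / len(vals)

    dim = max(range(len(keys[0])), key=variance)
    value = sum(k[dim] for k in keys) / len(keys)
    left = [k for k in keys if k[dim] <= value]
    right = [k for k in keys if k[dim] > value]
    if not left or not right:
        # All keys are identical; keep them in one (oversized) bucket.
        return Node(bucket=list(keys))
    return Node(dim, value, build(left, bucket_size), build(right, bucket_size))


def range_search(node, query, d, found):
    """Append to `found` every key within d of `query` in each dimension."""
    if node.bucket is not None:
        for key in node.bucket:
            if all(abs(k - q) <= d for k, q in zip(key, query)):
                found.append(key)
        return found
    low, high = query[node.dim] - d, query[node.dim] + d
    if low <= node.value:       # the range overlaps the left subtree
        range_search(node.left, query, d, found)
    if high > node.value:       # the range overlaps the right subtree
        range_search(node.right, query, d, found)
    return found
```

Only subtrees whose side of the partition value overlaps the query range are visited, which is what lets the search skip most of the dictionary when d is small.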

Let us now consider the complexity of our algorithm
when the k-d tree data structure is employed. Encoding
time is bounded by

$$ \frac{N \left( S(D_{\max}, m) + Q(N) + m \right)}{r} $$

where N is the number of pixels in the image, S(D_max, m)
is the maximum time to search a dictionary with a
maximum of D_max entries each with at most m pixels,
Q(N) is the time to insert and delete for the growing
points queue, and r is the amount of compression (original
size/compressed size). Straightforward implementation of
the growing heuristics we have considered uses O(log(N))
time by employing a heap data structure; however, this time
can be reduced to O(1) by implementing all heuristics in a
manner similar to FIFO. Under ideal assumptions, it can be
shown that the expected time for range search in k-d trees
is O(log n + B), where B is the number of blocks found
(Bentley and Stanat [3], Friedman, Bentley, and Finkel [7]).
If we take S(D_max, m) to be O(log(D_max)) (which from
our experiments appears to be a reasonable assumption),
the improved encoding time is

$$ O\!\left( \frac{N \log(D_{\max})}{r} \right) $$

under the reasonable assumption that m = O(log(D_max)).
In many applications, it may be reasonable to assume that
r is log(D_max), which brings the encoding time down to
O(N) time. As before, decoding is essentially table lookup,
and can be done in O(N) time.

Some parameters of the k-d tree should be adjusted by
experimentation with real data or simulation because they
reflect some compromise between time, memory space,
and retrieval quality that is generally dependent on the
application domain. After experimenting with a number of
alternatives we chose the following settings (used for all
the experiments reported in this paper):

Bucket Size: Maximum 8 blocks per bucket. (We exper- 
imented with bucket sizes ranging from 1 to 32.) 

Discriminating Dimension: The dimension with the
largest spread of values (computed by estimating the
variance on every dimension of the key, for the 8 blocks
in the bucket). (We experimented with random choice, and
with cyclic choice depending on the level in the tree.)

Partition Value: The mean value between all of the 
discriminating dimension values in the bucket. (We ex- 
perimented with random values which worked relatively 
well). 

Range: 1.25*d. (Even though mean-square error does
not satisfy the monotone properties discussed earlier, by
extending the range just a little to [x_i - 1.25*d, x_i + 1.25*d],
the retrieval quality is as good as for full search with an
insignificant increase in search time.)

Number of k-d Trees: Four trees t1, t2, t3, and t4, with
the following key sizes and block assignment:

t1 has 1 x 1 key and contains blocks of size 1 x n or
n x 1, with n ≥ 2
(t1 is simply a binary search tree).

t2 has 2 x 2 key and contains blocks of size 2 x n or
n x 2, with n ≥ 2.

t3 has 3 x 3 key and contains blocks of size 3 x n or
n x 3, with n ≥ 3, and

t4 has 4 x 4 key and contains blocks of size m x n,
with m, n ≥ 4.

Regarding the number of trees to use and the key
sizes, our algorithm is “normalized” by using l x l
elementary areas (l = 4 for all experiments reported here).
Hence, with a key of size at least l x l, no matter how “good”
a big block is on the rest, if it does not satisfy the distortion
threshold on the key area it will be rejected also by the full
search. Practically, the improvement in selectivity by using
keys bigger than 4 x 4 does not justify the increase in the
Table 1 Comparison of Compression Ratios for the Same SNR (PSNR). (Each group of columns
shows, for the same SNR (the corresponding PSNR is shown in parentheses), the compression
achieved by our algorithm with the new tree search data structure, our basic full-search
algorithm, and by JPEG.)

               “Very Good” Quality           “Good” Quality                “Fair” Quality
Image          SNR (PSNR)    TREE/FULL/JPEG  SNR (PSNR)    TREE/FULL/JPEG  SNR (PSNR)    TREE/FULL/JPEG
ChestCAT       29 (36)       4.3/4.3/3.0     22 (29)       8.9/8.9/4.8     18 (25)       12.8/12.7/6.7
BrainMR_Side   28.5 (39)     4.1/4.1/4.6     26.5 (37)     4.8/4.9/6.1     20.5 (31)     10.4/10.3/15.8
BrainMR_Top    27 (35)       2.8/2.9/2.4     20.5 (28.5)   5.7/5.7/3.9     15.5 (23.5)   10.4/10.8/6.6
NASA5          30.5 (41)     4.4/?/?         28 (38.5)     5.6/5.6/5.9     26 (36.5)     7.4/7.5/8.5
NASA6          46 (51.5)     22.6/22.8/8.4   40.5 (46.5)   74.?/80.1/64.7  39 (45)       107.8/106.5/65.1
WomanHat       35 (40.5)     4.0/4.1/4.4     30 (35)       8.6/8.8/13.7    27 (32.5)     14.4/14.5/22.5
LivingRoom     32 (38)       3.9/4.0/4.5     27 (35)       7.4/7.5/9.1     24.5 (30.5)   10.8/11.0/11.3
FingerPrint    32 (35)       6.2/6.3/6.5     24 (27)       26.5/26.5/27.3  22 (25)       37.6/38.9/25.0
HandWriting    32 (33)       17.3/17.0/9.5   24.5 (25.5)   61.0/60.1/32.0  17.5 (18.5)   172.0/177.0/67.3
Fig. 5. (a) Handwriting original. (b) Handwriting JPEG at 70-to-1. (c) Handwriting at 70-to-1
using rectangles. (d) Handwriting at 70-to-1 using squares. (e) The dictionary for Fig. 5(c). (f)
The dictionary for Fig. 5(d).
tree search time. We experimented with different strategies
of searching the forest of 4 trees but none proved to be
significantly better than searching in the order t4, t3, t2, t1,
where the search goes from one tree to the next only if no
block was found.

To evaluate the performance of our algorithm, we used
the test images described in Fig. 4. For each test image,
we adjusted the threshold to get three compressed files,
one of very good quality, one of good quality, and one of
fair quality; the results are shown in Table 1. Although the
compression obtained is nearly identical with the basic
full-search algorithm, the execution time for a 4K dictionary
was about 60 times faster (roughly speaking, we now
use seconds rather than minutes to encode on a UNIX
the “activity” in a region of the image as the ratio between
the variance (to the mean) V and the mean M on this
region. From experimentation, we can say that if the ratio
A is smaller than 4%-5%, then the area is smooth and we
use a smaller distortion threshold of 0.4*d for this area;
if 5% < A < 10% we use an intermediary threshold of
0.6*d, and if A > 10% then the area is active and we use
the entire threshold d. Figure 6(a) shows our algorithm on
the WomanHat image, using a constant distortion threshold
at 10-to-1 compression. Figure 6(b) shows the results of
the method described above at 10-to-1 compression. For
comparison, Fig. 6(c) shows JPEG at 10-to-1 compression.
Similarly, Fig. 7(a)-(c) shows the ChestCAT image using
constant distortion threshold at 10-to-1 compression, the
method described above at 10-to-1 compression, and JPEG
at 10-to-1 compression. In both Figs. 6(b) and 7(b), the
visual quality is much improved (especially on smooth
areas such as the shoulder in the WomanHat image and
the smooth part with the “A” in the ChestCAT image). By
comparison, note that in Fig. 7(c) JPEG is blocky and the
edges are not preserved; however, for WomanHat, Fig. 6(b)
and (c) have similar visual quality.
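
The activity-dependent threshold can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the activity is taken to be A = V/M as described above, and the cutoffs of 5% and 10% are assumptions chosen from the ranges quoted in the text (the text gives "4%-5%" for the lower cutoff).

```python
def activity(region):
    """Activity A = V / M: variance over mean of the pixels in a region."""
    pixels = [p for row in region for p in row]
    mean = sum(pixels) / len(pixels)
    if mean == 0:
        return 0.0  # an all-black region is treated as smooth
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return variance / mean


def local_threshold(region, d, smooth_cut=0.05, active_cut=0.10):
    """Scale the global distortion threshold d by the local activity."""
    a = activity(region)
    if a < smooth_cut:
        return 0.4 * d   # smooth area: allow less distortion
    if a < active_cut:
        return 0.6 * d   # intermediate area
    return d             # active area: use the entire threshold
```

Smooth areas thus receive a tighter threshold, which is where distortion is most visible to the eye.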

VI. Current Research

We are currently working on a number of extensions
to the basic approach presented in this paper. First, we
are continuing experiments to better understand how
different heuristics affect performance in terms of both
speed and quality. Second, parallel algorithms that run
in nearly O(√N) time with O(√N) processors are
possible. Third, of interest are formal proofs addressing
compression-fidelity tradeoffs.
Split-Merge Video Displacement Estimation

BRUNO CARPENTIERI AND JAMES A. STORER


Invited Paper 


Motion Compensation is one of the most effective techniques
used in interframe data compression. In this paper we present a
parallel block-matching algorithm for estimating interframe
displacement of blocks with minimum error. The algorithm is designed
for a simple parallel architecture to process video in real time. The
blocks may have variable size and shape depending on a
split-and-merge technique. The algorithm performs a segmentation of the
image into regions (objects) moving in the same direction and uses
this knowledge to improve the transmission of the displacement
vectors. This segmentation identifies the part of the frame “active”
with respect to the previous frame and preserves some of the spatial
correlation between blocks.

I. Introduction

Data Compression is essential for the storage and trans- 
mission of digital video, where large amounts of data 
must be handled by devices with a limited bandwidth. 
For example, digital High Definition Television (HDTV) 
requires more than 1 billion bits per second in uncom- 
pressed form. Knowledge of motion or displacement of 
groups of pixels in successive frames can be the basis of 
video compression algorithms and can be used in addition to 
other classical single-image compression techniques, such 
as transform, interpolation, and quantization algorithms, to 
greatly reduce the amount of data transmitted. Here we 
will restrict our attention to the translational component of 
the motion and refer to the algorithms that compute the 
trajectory information of a pixel or a block of pixels as 
displacement estimation algorithms. 

Block-Matching Displacement Estimation Algorithms
divide a frame into a number of rectangular blocks and
compute a displacement vector for each block by correlating
the block with a search area in the previous frame; see Jain
and Jain [4], Koga et al. [7], Srinivasan and Rao [9].

In this paper we present a real-time parallel algorithm 
for displacement estimation using a two-dimensional grid 
architecture and then show how the algorithm can be 
implemented on a pipe. The algorithm is based on a
block-matching approach to the problem and uses a
split-and-merge technique: the blocks (superblocks) have a variable
size that is determined at each step of the algorithm from
the previous step and the input data. In fact, the algorithm
performs a segmentation of the image into areas moving in
the same direction and uses this knowledge to improve the
transmission of the displacement vectors of the elementary
blocks.

In the next section we outline the sequential fixed-size
block displacement estimation algorithm presented in
Jain and Jain [4]. In Section III we present our new
algorithm. Section IV is devoted to its analysis. Section V
discusses experimental results. Section VI outlines how the
segmentation operated by the Split-Merge technique can be
the basis of a full video coder. In Section VII we present
our conclusions.

II. Image Coding and Displacement Estimation

In this section we review the fixed-size block
displacement estimation algorithm proposed by Jain and Jain [4].
This algorithm and its assumptions have been a guideline
for more recent work in the field; similar approaches
are taken by Koga et al. [7], Srinivasan and Rao [9],
Kappagantula and Rao [5], Puri et al. [8], and Ghanbari
[3].

In a typical displacement estimated image coding
algorithm the frame is segmented into blocks. For each block
a displacement vector is computed and sent to the decoder;
moreover, the encoder computes the difference between
the original frame and the frame that the decoder
could reconstruct from the displacement vectors, and sends
this difference image to the decoder. All data sent from
the encoder to the decoder may eventually go through an
additional entropy coding phase.

A. Displacement Estimation

The algorithm proposed in Jain and Jain [4] segments 
an image into fixed-size small rectangular blocks, each 
block assumed to be undergoing independent translation. 
If these areas are small enough, rotation, zooming, etc., of
larger objects can be closely approximated by piecewise
translation of these smaller areas. The goal is to
approximate interframe motion by piecewise translation of one or
more areas of a frame relative to a reference frame. Let
U be an M x N size block of an image and U_r be an
(M + 2p) x (N + 2p) size area of a reference (neighboring)
image, centered at the same spatial location as U, where p
is the maximum displacement allowed in either direction
in integer number of pixels. The algorithm requires for
each block a search of the direction of minimum distortion
(DMD), i.e., of the displacement vector that minimizes
a given distortion function. A possible mean distortion
function between U and U_r is defined in Jain and Jain [4] as


$$D(i,j) = \frac{1}{MN} \sum_{m=1}^{M} \sum_{n=1}^{N} g\bigl(u(m,n) - u_r(m+i,\, n+j)\bigr), \qquad -p \le i, j \le p,$$


where g(x) is a given positive and increasing distortion
function of x. The direction of minimum distortion is given
by (i, j), such that D(i, j) is minimum.

One problem of this approach is that finding optimal
displacements requires the evaluation of D(i, j) for
(2p + 1) x (2p + 1) directions per block. For example, even for
motions up to 5 pixels along either side of the axes, a
search of 121 positions per block is required. The solution
proposed in Jain and Jain [4] is to assume that the data are
such that the distortion function monotonically increases
as we move away from the DMD along any direction in
each of the four quadrants. This assumption makes possible
a search procedure for the DMD that is an extension in
two dimensions of the standard logarithmic search in one
dimension (see Knuth [6]).
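
For reference, the exhaustive search that the logarithmic procedure tries to avoid can be sketched as below. This is a minimal sketch under stated assumptions: the function names are hypothetical, the distortion g is taken to be the absolute value (one possible choice of a positive, increasing function of the error magnitude), and displacements that would read outside the reference frame are simply skipped.

```python
def distortion(cur, ref, top, left, M, N, i, j, g=abs):
    """Mean distortion D(i, j) between the M x N block of `cur` whose
    top-left corner is (top, left) and the block displaced by (i, j) in `ref`."""
    total = 0
    for m in range(M):
        for n in range(N):
            total += g(cur[top + m][left + n] - ref[top + m + i][left + n + j])
    return total / (M * N)


def full_search(cur, ref, top, left, M, N, p):
    """Evaluate up to (2p + 1) x (2p + 1) displacements and return the
    direction of minimum distortion together with its distortion value."""
    h, w = len(ref), len(ref[0])
    best, best_d = None, None
    for i in range(-p, p + 1):
        for j in range(-p, p + 1):
            # Skip displacements that fall outside the reference frame.
            if not (0 <= top + i and top + i + M <= h and
                    0 <= left + j and left + j + N <= w):
                continue
            d = distortion(cur, ref, top, left, M, N, i, j)
            if best_d is None or d < best_d:
                best, best_d = (i, j), d
    return best, best_d
```

With p = 5 the two loops visit (2 * 5 + 1) ** 2 = 121 candidate positions per block, which is the count quoted in the text.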

In the next section we present a parallel algorithm that 
eliminates the need for this assumption and which can 
be implemented to run on-line on a practical parallel
architecture. 


III. A Split-Merge Parallel Block-Matching Algorithm

In this section we present a new parallel block-matching 
algorithm for displacement estimation based on a split-and- 
merge technique taking advantage of the fact that groups 
of blocks often move in the same direction (for instance, if 
they are part of the same object or part of the background).
The encoding algorithm computes the displacement vec- 
tors (in parallel) and sends them in compact form to the 
decoder. The decoder receives the data and constructs an 
approximate version of the image, which will be corrected 
in the next step of the general encoding algorithm. 

A. The Model of Computation

To process frames of n pixels each, the encoding algorithm
employs a √N x √N grid of processors, 1 < N < n,
each having O(n/N) local memory. Although all of what
we present is well defined when N << n, to simplify our
presentation we shall assume N = kn for some 0 < k < 1
(and here each processor has O(1) local memory). For
decoding we will need only a single processor with O(n)
memory.

Each frame is divided into N rectangular blocks numbered
in the same way as the processors; we assume that
at time t processor i receives as input block i from the tth
frame. Since each processor corresponds to a block, and
vice versa, from now on we will use the terms processor
and block interchangeably.

The encoding algorithm implies the use of a sequential
controller to monitor the execution of the algorithm. The
controller will need O(N) dynamic memory and will
perform communication operations only with processor 1.
We will identify this controller with processor 1 itself
by allocating to this processor an additional O(N) local
dynamic memory. The encoder computes the displacement
vectors and transmits them in a compact form to the decoder
on a serial line. Figure 1 depicts our model of computation.
The input frames come to the frame buffer on a high-speed
communication line, in time proportional to n. The data
flow from the frame buffer to the grid architecture that
performs the search of the optimal displacement for each
block. The communication between the frame buffer and
the grid architecture has to be performed fast enough to
allow the grid time to perform the necessary computation
on the current frame before receiving the next frame. In
fact, the bold arrow implies that this communication should
be performed either in parallel or on a serial line with
a speed of cn/N pixels per unit of time, where c is a
system-dependent constant. Figure 2 shows a possible
implementation of the frame buffer, embedded into the grid.
The input is pipelined through the processors. At each step
each processor can pass the input to its neighbor and, when
necessary, can simultaneously copy it into its own working
memory.

B. The Encoder Algorithm 

Figure 3 shows the encoder algorithm at time t. Each
processor at time t computes in parallel the displacement
of the block that it represents (in frame t) with respect to a
search area in frame t - 1. For simplicity we assume that
the size of the search area is exactly 3 x 3 blocks, that is,
for each processor we limit the search area to its adjacent
blocks. Processor i at time t keeps the description of the
block it represented at time t - 1 in the variable block_pf(i)
(the subscript pf is short for "previous frame"). If at time
t - 1 a number of pairwise adjacent blocks have the same
displacement vector, then at time t they are considered to
be a well-defined superblock: superblock(i), where i is the
leader of the superblock (the processor with minimum ID).

Fig. 2. A frame buffer implementation.

If they continue to move together in the same direction
at time t, just a single displacement vector for the whole
superblock is sent from the encoder to the decoder. Each
processor is not aware of the shape of the superblock to
which it belongs, but is aware of the adjacent processors
that move in its direction (coblock). The union of coblocks
for adjacent processors that move in the same direction will
define a superblock. At each time t the algorithm can be
divided into three steps. In Phase 1 (the compute phase)
the displacement vector for each single block is computed
and each processor compares its displacement with the
displacement of the adjacent processors. Each processor i
keeps a list of the adjacent processors that move in its same
direction: coblock(i). In Phase 3, whenever this is possible,
these lists will be merged together into superblocks. At
time t + 1 a single displacement vector will be sent from the
encoder to the decoder for all the processors in a superblock
that still move in the same direction.
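
The coblock bookkeeping of Phase 1 can be illustrated with a short sequential sketch. It assumes the displacement vectors have already been computed and are stored in a 2-D list indexed like the processor grid; the function name and the use of grid coordinates instead of processor IDs are choices made for the example only. On the parallel machine each processor would perform just its own inner comparison.

```python
def coblocks(vectors):
    """vectors: 2-D list (grid of processors) of displacement tuples.
    Returns a dict mapping each grid position to the set of its (up to
    eight) adjacent positions whose displacement vector is identical."""
    rows, cols = len(vectors), len(vectors[0])
    result = {}
    for r in range(rows):
        for c in range(cols):
            same = set()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if (dr, dc) != (0, 0) and inside:
                        if vectors[nr][nc] == vectors[r][c]:
                            same.add((nr, nc))
            result[(r, c)] = same
    return result
```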

In Phase 2 (the split-and-send phase) processor 1, the
processor in the upper left corner of the grid, becomes the
controller and communicates with the other processors,
gathering information on their displacement vectors and
deciding whether they still belong to a superblock or whether
a split has occurred (i.e., processors leave a superblock
because their motion differs from that of the majority;
we address the complexity of this operation in the next
section). Being aware of all the displacement vectors for
the N processors, the controller tests, for each superblock,
whether splits have occurred and constructs the list-of-splits, i.e.,
the list that specifies which processors that were assumed
to be part of a superblock are no longer part of it because

Phase 1: (COMPUTE)

for each processor i in parallel do:
begin
   1) for every adjacent processor neighbor
         do get block_pf(neighbor)
   2) COMPUTE (full search) its own displacement vector v_t(i)
   3) for every adjacent processor neighbor do get v_t(neighbor)
   4) set coblock(i) = set of adjacent processors neighbor such
         that v_t(neighbor) = v_t(i)
end

Phase 2: (SPLIT and SEND)

controller do
begin
   1) for i = 1 to N do
         get v_t(i) and store it into the record representing i in
         the superblock to which i belongs
   2) for each superblock do
      begin
         2.1) SPLIT the superblock into groups of processors
              having the same displacement vector v_t()
         2.2) let pmin be the processor with minimum ID in the larger group
         2.3) if pmin is not the leader of the current superblock then make a
              new superblock with leader pmin and displacement vector v_t(pmin)
         2.4) add the other groups to list-of-splits
         2.5) if pmin is not the leader of the current superblock then
              delete the current superblock
      end
   3) if length(list-of-splits) > T then for i = 1 to N do SEND v_t(i)
      else
      begin
         SEND list-of-splits
         for each superblock(i) do
            SEND v_t(superblock(i))
      end
end

Phase 3:

controller do
begin
   1) for i = 1 to N do get coblock(i)
   2) from the coblocks at time t construct the new superblocks at time t + 1
end

Fig. 3. The algorithm at time t.


they are now moving in a different direction with respect
to the rest of the superblock. If the length of list-of-splits is
less than a threshold T, then the controller sends the
list-of-splits and the displacement vectors for the superblocks;
otherwise, it sends the displacement vectors for each single
block. The threshold monitors the efficiency, in terms of
amount of data sent to the decoder, of sending both the
list-of-splits and the superblock displacement vectors instead of
the displacement vectors for each single block.
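
The controller's work in Phase 2 can be sketched sequentially as follows. Several details here are assumptions made for illustration and are not stated in the text: the data layout (dictionaries keyed by processor ID), measuring the length of the list of splits as the number of split processors, and breaking ties between equally large groups in favor of the group holding the smallest ID.

```python
from collections import defaultdict


def split_and_send(superblocks, v, T):
    """superblocks: dict leader_id -> set of processor IDs (leader = min ID).
    v: dict processor_id -> displacement vector at time t.
    T: threshold on the length of the list of splits.
    Returns (mode, payload) describing what would be transmitted."""
    list_of_splits = []      # (vector, sorted member IDs) per split group
    kept = {}                # surviving superblocks, keyed by their leader
    for leader in sorted(superblocks):
        groups = defaultdict(list)
        for pid in sorted(superblocks[leader]):
            groups[v[pid]].append(pid)                 # step 2.1
        # Step 2.2: pick the largest group (assumed tie rule: smallest ID).
        main = max(groups.values(), key=lambda g: (len(g), -g[0]))
        kept[main[0]] = set(main)                      # steps 2.3 and 2.5
        for g in groups.values():                      # step 2.4
            if g is not main:
                list_of_splits.append((v[g[0]], g))
    n_split = sum(len(g) for _, g in list_of_splits)
    if n_split > T:                                    # step 3: per-block mode
        return "blocks", [v[pid] for pid in sorted(v)]
    vectors = [(leader, v[leader]) for leader in sorted(kept)]
    return "superblocks", (list_of_splits, vectors)
```

With a small threshold the sketch falls back to one vector per block, which mirrors the worst case of the fixed-size block scheme.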

In Phase 3 the superblocks at time t + 1 are built
from the coblocks. The encoder and the decoder dynamically
maintain a list of the superblocks, i.e., of which block
belongs to which superblock. No communication between
the encoder and the decoder is needed to maintain the
description of the superblocks: the decoder has enough
information to compute the shape of the superblocks at time
t + 1. At every time t a list of the positions in which a split
has occurred is sent from the encoder to the decoder: in this
way the decoder is able to decode the displacement vectors
sent from the encoder.
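
One way to picture the merge of Phase 3 is as a connected-components computation over the coblock relation, with the minimum ID of each component acting as leader. The sketch below is an illustration under that reading; the function name and the dictionary representation are hypothetical, and the paper does not prescribe a particular traversal.

```python
def build_superblocks(coblock):
    """coblock: dict processor_id -> set of adjacent IDs moving the same way.
    Returns dict leader_id -> set of member IDs, one entry per superblock."""
    seen, superblocks = set(), {}
    for start in sorted(coblock):
        if start in seen:
            continue
        # Depth-first walk over the "moves in my direction" relation.
        members, stack = set(), [start]
        while stack:
            pid = stack.pop()
            if pid in members:
                continue
            members.add(pid)
            stack.extend(coblock[pid] - members)
        seen |= members
        superblocks[min(members)] = members   # leader = minimum ID
    return superblocks
```

Each processor ID is pushed a bounded number of times because every coblock has at most eight entries, so the walk is linear in the number of processors.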


C. The Decoder Algorithm 

The decoder receives at time t the information sent from
the encoder during Phase 2. It has computed at time t - 1
the superblocks at time t and therefore it can assign to
each block the corresponding displacement vector. Finally,
it has enough information to compute which blocks will be
in which superblocks at time t + 1. The decoder is not a
parallel machine: one single processor suffices to perform
the necessary operations. The decoder uses O(N) memory
to decode each frame in O(N) time.

D. Splits and Displacement Vectors

One of the critical points of the algorithm is the communication
from the encoder to the decoder of the list-of-splits,
i.e., of the list of the processors that at time t belonged to
a superblock but no longer do, and of their displacement
vectors. There are two requirements that the list-of-splits
must satisfy: it must be computationally easy to build, and
it must have a concise encoding; otherwise, sending only
one displacement vector for each superblock would not be
convenient because of the necessity of also sending the
list-of-splits.

The list-of-splits is dynamically built: in line 2.4 of Phase
2, groups of processors are added to the list, with a single
displacement vector for each group. We keep a hash table
of the possible displacement vectors: each time a group
is added to the list we compute the hash value of its
displacement vector and we associate to the corresponding
entry in the table this displacement vector and the list
of the processors in the group. This list begins with the
ID of the smallest processor, then the IDs of the other
processors follow, each coded in terms of the displacement
with respect to the previous one. Because the processors
were part of the same superblock and are still moving in the
same direction, we can expect their ID numbers to be very
close and we can get good compression with this simple
heuristic. When the encoder sends the list-of-splits, it sends
each nonzero entry in the table.
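
The ID coding heuristic described above amounts to a simple gap (delta) encoding. The following sketch shows the idea; the function names are hypothetical and the actual bit-level representation used by the encoder is not specified in the text.

```python
def encode_group(ids):
    """First the smallest ID, then each further ID as the gap to its
    predecessor in increasing order."""
    ordered = sorted(ids)
    return [ordered[0]] + [b - a for a, b in zip(ordered, ordered[1:])]


def decode_group(codes):
    """Inverse of encode_group: a running sum restores the IDs."""
    ids, current = [], 0
    for code in codes:
        current += code
        ids.append(current)
    return ids
```

For a group such as {129, 130, 131, 162} the encoder would emit 129 followed by the gaps 1, 1, 31; neighboring processors produce small gaps, which is what makes the list cheap to transmit.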

There might be more than one solution to the computation
in line 2 of Phase 1. The block examined could match
optimally more than one block in the search area, or else
we may want to consider in the next phase more than
one direction in which the block can move, so as
to have more options when it is time to shape the
superblocks. A way to do this is to save for each block
all the displacement vectors that allow an error less than a
given threshold when the block is matched in the search area.
In this case, in line 1 of Phase 2, the processor sends to
the controller not only a single vector but a list of possible
vectors.

To determine the eventuality of a split, in line 2.1 of Phase 
2, the controller shall compute in which of the possible 
directions the majority of the processors move. The number 
of possible directions is finite and the computation can 
be limited in advance by limiting the length of each list 
of possible vectors to an appropriately chosen constant 
L. Phase 3 is not affected by considering more than 
one displacement value per vector in Phase 2: a single 
displacement vector per block has been sent in Phase 2, 
and now only that vector has to be considered in Phase 3. 

E. Implementation on a Pipe 

Figure 4 shows how the algorithm can be implemented
on a pipe. The inputs to the pipe are the current frame and
the previous frame reconstructed by the decoder. The input
flows in linear time through all the processors. Each processor
has to construct the search area by using the information
from the previous frame: after O(N) time every processor
has available both the block it is representing at the current
time and the search area in the previous frame.

The computation involved and the details of the algorithm
are analogous to the grid implementation.

IV. Analysis of the Algorithm

In this section we analyze the encoder algorithm in terms 
of complexity, fidelity, and compression. The analysis is 
done for the grid implementation; similar arguments hold
for the pipe implementation. 

A. Complexity 

Let N be the number of processors in the grid, where
N = kn for 0 < k < 1. In Phase 1, lines 1 and 3 involve
direct neighbor communication and take constant time. The
computation involved in line 2 is the most expensive part of
Phase 1, but it still takes constant time, where the constant
depends on the size of the search area. The for loop in line
1 of Phase 2 might seem to involve O(N^2) communication
on a grid architecture: processor 1 has to interact with all
the other processors. If we number the blocks by row and
column, this for loop can be easily pipelined as shown in
Fig. 5. Therefore, processor 1 will always interact at each
iteration of the loop with an adjacent processor, processor
2, and the loop will take O(N) time. The complexity
of line 2 (2.1-2.5) depends on the number of processor
IDs examined. The superblocks are pairwise-disjoint sets;
therefore, line 2 has a time complexity of O(N). Line 3
also involves O(N) time.

The for loop of line 1 of Phase 3 can be pipelined and
takes O(N). For each processor the coblock has constant
size (each processor has at most eight neighbors); therefore,
line 2 has time complexity O(N).

In fact, the whole algorithm has at each step t a time
complexity O(N) = O(kn), i.e., linear in the size of the
input: it is an on-line algorithm.

Fig. 5. Pipelining.

Each processor, with the exception of the controller,
needs a constant amount of memory. The controller needs
O(N) dynamic memory to represent the superblocks and
to store the coblocks and the displacement vectors.

B. Fidelity 

The displacement vectors computed by our algorithm are
at least as accurate as, and generally more accurate than,
those computed by the sequential algorithm: we do not make
any a priori assumption to simplify the search; rather, we
search all the possible directions.

C. Compression 

The amount of data sent from the encoder to the decoder
in our algorithm is in the worst case equal to the amount
sent by the fixed-size block algorithm, but the algorithm
has the possibility to transmit much less data.

The size of the blocks is chosen in such a way as to
approximate different movements of an object by piecewise
translation of the blocks themselves. An object may be
composed of a large number of blocks, all of which move
in the same direction, even in the case when the motion
in the sequence is due to a movement of the camera. If
neighboring objects move in the same direction with the
same speed, they will belong to the same superblock. In
fact, a simple but important case is when large groups of
pixels comprise “background” scenery that stays relatively
constant from one frame to the next.

The superblocks will generally consist of many processors,
the length of the list-of-splits will be negligible with
respect to the cardinality of the superblocks, and at each
time t a significant reduction in size of the data sent from
the encoder to the decoder is expected. However, when the
length of the list-of-splits becomes bigger than the threshold
T, the controller acts as in the fixed-size algorithm and sends
to the decoder one displacement vector per block,
from v_t(1) to v_t(N), instead of sending the list-of-splits and
the displacement vectors for the superblocks. In this way,
it sends the same amount of data that the algorithm in Jain
and Jain [4] would have sent. The decoder can infer that the
displacement vectors received refer to the blocks, and not
to the superblocks, from the number of vectors received.

Fig. 6. First and last frame of the sequence “Salesman.”

Fig. 7. First and last frame of the sequence “Fog.”

Fig. 8. First and last frame of the sequence “Kids.”

Fig. 9. First and last frame of the sequence “Mountains.”

Fig. 10. First and last frame of the sequence “Pastorale.”

V. Experimental Results
We have performed experiments with the following data 
set (Figs. 6-10 show the first and the last frame of each 
of these sequences): 

Salesman 

This sequence is one of the standard test sequences in
video compression. It is currently available for anonymous
ftp at ipl.rpi.edu and consists of 448 frames, 360 x 288, 8
bits per pixel. It contains relatively little detail or motion,
typical of the head and shoulders sequences common in
video-telephone applications.

Fog 

From the motion picture “Casablanca,” the final scene
when Humphrey Bogart and Ingrid Bergman say good-bye 
in the fog at the airport. This sequence is composed of 60 
frames, 152 x 114, 8 bits per pixel, digitized at a rate of 
12 frames per second. There is a considerable amount of 
noisy movement due to the foggy background. 

Kids 

From the motion picture “It's a Wonderful Life,” it is
one of the first scenes, where kids (the main characters as
children) are sitting at a desk. This sequence is composed
of 100 frames, 152 x 114, 8 bits per pixel, digitized at
a rate of 12 frames per second. There is a fair amount of
movement due to the presence of three characters.

Mountains 

From the motion picture “The Sound of Music,” one of
the final scenes, where the main characters are walking in
the mountains. This sequence is composed of 60 frames,
152 x 114, 8 bits per pixel, digitized at a rate of 12
frames per second. The scene involves a noticeable amount
of movement.


Pastorale 

From the motion picture “Fantasia,” a scene from the part
of the movie illustrating Beethoven's 6th Symphony. This
sequence is composed of 60 frames, 152 x 114, 8 bits per
pixel, digitized at a rate of 12 frames per second.

We define, as usual, the SNR correlation (in decibels)
between two frames X and Y of dimension M x N as


$$\mathrm{SNR}(X, Y) = 10 \times \log_{10} \frac{\sum_{i<M,\, j<N} X(i,j)^2}{\sum_{i<M,\, j<N} \bigl(X(i,j) - Y(i,j)\bigr)^2}.$$
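
A small sketch of how such a measure can be computed is given below. It assumes the common definition of SNR as the energy of the frame X divided by the energy of the difference X - Y, expressed in decibels; the function name and the handling of identical frames (returning infinity) are choices made for this example.

```python
import math


def snr_db(X, Y):
    """SNR in decibels between frames X and Y (lists of rows, same shape).
    Assumes the common definition: energy of X over squared error."""
    signal = sum(x * x for row in X for x in row)
    noise = sum((x - y) ** 2
                for rx, ry in zip(X, Y)
                for x, y in zip(rx, ry))
    if noise == 0:
        return math.inf   # identical frames
    return 10 * math.log10(signal / noise)
```

Higher values indicate that two frames are closer to each other, so low points in a plot of consecutive-frame SNR correspond to high motion.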
To describe the amount of movement present in each of
the test sequences, Fig. 11 presents for each sequence the
SNR correlation between pairs of consecutive frames. On the
Y axis we plot the SNR correlation, in decibels, between
a frame and the previous one; on the X axis, the frame
number. We can see, for example, that in the sequence
“Kids” and in the sequence “Mountains” (Fig. 11(c), (d))
there is at first a higher amount of movement (the first 20
frames of “Kids” and the first 30 of “Mountains”), and then
a lower amount of motion. Therefore, the graphs show very
low points for the first part of the sequence and then a brisk
increase and a smoother behavior. In the sequence “Kids,”
this is due to the fact that in the first 20 frames the blonde
girl moves from the left corner of the picture and sits down
at the desk while the boy gets closer; then in the rest of the
sequence the two girls and the boy move slightly and chat.
In the sequence “Mountains,” at the beginning people are
walking fast to the top of the hill, but at the end they slow
down and turn to the mountains.

Fig. 12. Comparison with the standard full search, fixed-size
block algorithm.

Figure 12 shows, in a table, the results we have obtained
comparing our algorithm to the standard full search
algorithm. The first column of the table identifies the
sequence; the second column reports for each sequence
the average SNR (in decibels) between consecutive frames
as a measure of their correlation. The third and fourth
columns present the results of the comparison between the
full search algorithm and the Split-Merge algorithm for the
test sequences. We have run the full search algorithm with
block size 8 (8 pixels by 8 pixels blocks) and block size
4 (4 pixels by 4 pixels blocks) and we have reported in
the first subcolumns of the third and fourth columns the
average SNR between the original frames and the prediction
obtained. Then we have run our algorithm, setting the
parameters so as to achieve that same average SNR,
and in the second subcolumns we have compared the size
of the predictions, i.e., the number of bytes needed to send
the prediction from the encoder to the decoder assuming no
lossless compression is performed.

As can be seen in Fig. 12, for the same SNR, our
algorithm has in general a noticeable saving in size with respect
to the full search algorithm. In the sequence “Fog” the
foggy background produces noisy effects on the segmentation
performed by the Split-Merge algorithm; those effects
are particularly relevant when we use a very small initial
block size (2 pixels by 2 pixels). This is why the Split-Merge
algorithm outperforms the full search algorithm in
all experiments except in the case of the sequence “Fog” and
initial block size 2.

CARPENTIERI AND STORER: SPLIT-MERGE VIDEO DISPLACEMENT ESTIMATION

Fig. 13. Sequence “Kids,” comparison of Full Search block size
8 (a) and Split-Merge initial block size 4 (b). X = frame number,
Y = SNR (dB).

Fig. 14. Segmentation of the frames “Salesman” 100, 200, 200,
447 into superblocks; the initial block size is 4.

While our analysis has been done in terms of average
SNR, it is true that the algorithm performs equally well
on a frame-by-frame basis with respect to the full search
algorithm. For example, for the sequence “Kids,” Fig. 13
shows the SNR values, frame by frame, obtained by the
full search algorithm, with block size 8, and the values
obtained by the Split-Merge algorithm with initial block
size 4 and parameters set to achieve the same average
SNR as the full search algorithm. This is true also for all
the other sequences tested. On a frame-by-frame basis, the
Split-Merge algorithm behaves almost exactly like the full
search algorithm.

VI. Splits and Video Coding

This technique suggests a complete video compression
algorithm based on the different levels of action that are
generally present in a video scene, identified as “splits” and
“superblocks.” In fact, the segmentation of the frames into
superblocks and splits can be the kernel of a complete video
compression system. The locations of the splits identify the
parts of the frame that are “active” with respect to the previous
frame, while the segmentation into superblocks preserves
some of the spatial correlation between blocks and avoids
some “squaring” effects in the predicted frame.

In Carpentieri and Storer [2] we have presented a video
coder based on this Split-Merge displacement estimation
technique. The video coder uses the splits and superblocks
information to improve the error correction module: two
different thresholds are used to determine if a block needs
to be corrected, depending on whether the block is a split or
belongs to a superblock. This would not be possible
using the fixed block displacement estimation algorithm,
which has no notion of spatial correlation between blocks
or of active parts of the frame.

Figure 14 shows a segmentation of four frames from the 
sequence “Salesman,” into superblocks; the initial block 
size is 8. Blocks belonging to the same superblock have
the same tone of gray. The splits are depicted by blocks 
having alternating sequences of black and white pixels. 

The splits in Fig. 14 correspond to the parts of the scene
that are active in the transition between the previous frame
and the current frame. In fact they are concentrated in the
portion of the picture relative to the head of the salesman,
to his right hand, and to the object in his hand.


VII. Conclusion

We have presented a new on-line parallel algorithm for
displacement estimation based on the block-matching approach,
as well as an on-line parallel implementation of this
algorithm. At each time t both the decoder and the encoder
have available the description of the superblocks computed
at time t - 1. Each superblock is a set of contiguous
blocks that move in the same direction. The partition of
the image into superblocks corresponds to an approximate
segmentation of the image into areas (objects) that move
in the same direction. The quality of the approximation
depends on the granularity chosen (i.e., the size of the
blocks and the setting of the internal parameters). Our
algorithm uses this knowledge of the segmentation of the
frames to optimize the transmission of the displacement
vectors.

Segmenting frames into superblocks preserves the spatial
correlation between the blocks in the superblock. This may
improve the visual quality of the prediction. Because the
splits represent blocks that are in a certain sense “new”
with respect to the previous frame, a different degree of
correction accuracy can be used for blocks that are splits.